DROPS

Document

Integrating HPC, AI, and Workflows for Scientific Data Analysis (Dagstuhl Seminar 23352)

Authors: Rosa M. Badia, Laure Berti-Equille, Rafael Ferreira da Silva, and Ulf Leser

Published in: Dagstuhl Reports, Volume 13, Issue 8 (2024)

Abstract

The Dagstuhl Seminar 23352, titled "Integrating HPC, AI, and Workflows for Scientific Data Analysis," held from August 27 to September 1, 2023, was a significant event focusing on the synergy between High-Performance Computing (HPC), Artificial Intelligence (AI), and scientific workflow technologies. The seminar recognized that modern Big Data analysis in science rests on three pillars: workflow technologies for reproducibility and steering, AI and Machine Learning (ML) for versatile analysis, and HPC for handling large data sets. These elements, while crucial, have traditionally been researched separately, leading to gaps in their integration. The seminar aimed to bridge these gaps, acknowledging the challenges and opportunities at the intersection of these technologies. The event highlighted the complex interplay between HPC, workflows, and ML, noting how ML has increasingly been integrated into scientific workflows, thereby increasing resource demands and bringing new requirements to HPC architectures, such as support for GPUs and iterative computations. The seminar also addressed the challenges in adapting HPC for large-scale ML tasks, including in areas such as deep learning, and the need for workflow systems to evolve to fully leverage ML in data analysis. Moreover, the seminar explored how ML could optimize scientific workflow systems and HPC operations, such as through improved scheduling and fault tolerance. A key focus was on identifying prestigious use cases of ML in HPC and understanding their unique, unmet requirements. The stochastic nature of ML and its impact on the reproducibility of data analysis on HPC systems was also a topic of discussion.

Cite as

Rosa M. Badia, Laure Berti-Equille, Rafael Ferreira da Silva, and Ulf Leser. Integrating HPC, AI, and Workflows for Scientific Data Analysis (Dagstuhl Seminar 23352). In Dagstuhl Reports, Volume 13, Issue 8, pp. 129-164, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@Article{badia_et_al:DagRep.13.8.129,
  author =	{Badia, Rosa M. and Berti-Equille, Laure and da Silva, Rafael Ferreira and Leser, Ulf},
  title =	{{Integrating HPC, AI, and Workflows for Scientific Data Analysis (Dagstuhl Seminar 23352)}},
  pages =	{129--164},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2024},
  volume =	{13},
  number =	{8},
  editor =	{Badia, Rosa M. and Berti-Equille, Laure and da Silva, Rafael Ferreira and Leser, Ulf},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.13.8.129},
  URN =		{urn:nbn:de:0030-drops-198162},
  doi =		{10.4230/DagRep.13.8.129},
  annote =	{Keywords: Large scale data presentation and analysis, Exascale class machine optimization, Performance data analysis and root cause detection, High dimensional data representation}
}

Document

DOI: 10.4230/DagSemProc.08131.1

08131 Executive Summary – Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives

Authors: Michael Ashburner, Ulf Leser, and Dietrich Rebholz-Schuhmann

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)

Abstract

Researchers in Text Mining and researchers active in developing ontological resources provide solutions to preserve semantic information properly, i.e. in ontologies and/or fact databases. Researchers from both fields tend to work independently from each other, but there is a shared interest to profit from ongoing research in the complementary domain. The relatedness of both domains has led to the idea to organize a workshop that brings together members of both research domains.

Cite as

Michael Ashburner, Ulf Leser, and Dietrich Rebholz-Schuhmann. 08131 Executive Summary – Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, pp. 1-5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

@InProceedings{ashburner_et_al:DagSemProc.08131.1,
  author =	{Ashburner, Michael and Leser, Ulf and Rebholz-Schuhmann, Dietrich},
  title =	{{08131 Executive Summary – Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--5},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.1},
  URN =		{urn:nbn:de:0030-drops-15234},
  doi =		{10.4230/DagSemProc.08131.1},
  annote =	{Keywords: Text Mining, natural language processing, ontologies, ontology design, machine learning, bioinformatics, medical informatics, knowledge management}
}

Document

DOI: 10.4230/DagSemProc.08131.8

Mining Phenotypes for Protein Function Prediction

Authors: Ulf Leser, Philip Groth, Bertram Weiss, and Hans-Dieter Pohlenz

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)

Abstract

Until very recently, phenotypes were only rarely studied in a systematic manner. While ontologies for describing gene functions now have a 10-year-long tradition, similar vocabularies for describing the phenotypes of genes are only now emerging; similarly, techniques for determining phenotypes on a large scale (especially RNAi) have been available for only a few years, while genomic sequencing and gene expression studies have been established for much longer. In this talk, we describe results from a study on exploiting phenotype descriptions for protein function prediction. We used data from PhenomicsDB, a phenotype database integrated from several publicly available data sources. Due to the lack of standardization, phenotypes in PhenomicsDB can only be viewed as text (short statements, abstracts, singular terms, ...). We clustered these texts and analyzed the corresponding gene clusters in terms of their coherence in functional annotation and their interconnectedness by protein-protein interactions. We also devised a method for using close similarity in phenotype descriptions to predict the function of proteins. We show that this method yields very good precision at acceptable coverage.

Cite as

Ulf Leser, Philip Groth, Bertram Weiss, and Hans-Dieter Pohlenz. Mining Phenotypes for Protein Function Prediction. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

@InProceedings{leser_et_al:DagSemProc.08131.8,
  author =	{Leser, Ulf and Groth, Philip and Weiss, Bertram and Pohlenz, Hans-Dieter},
  title =	{{Mining Phenotypes for Protein Function Prediction}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.8},
  URN =		{urn:nbn:de:0030-drops-15133},
  doi =		{10.4230/DagSemProc.08131.8},
  annote =	{Keywords: Data mining, function prediction, bioinformatics, phenotypes, text mining}
}

Document

DOI: 10.4230/DagSemProc.04292.1

Data Mining: The Next Generation

Authors: Raghu Ramakrishnan, Rakesh Agrawal, Johann-Christoph Freytag, Toni Bollinger, Christopher W. Clifton, Saso Dzeroski, Jochen Hipp, Daniel Keim, Stefan Kramer, Hans-Peter Kriegel, Ulf Leser, Bing Liu, Heikki Mannila, Rosa Meo, Shinichi Morishita, Raymond Ng, Jian Pei, Prabhakar Raghavan, Myra Spiliopoulou, Jaideep Srivastava, and Vicenc Torra

Published in: Dagstuhl Seminar Proceedings, Volume 4292, Perspectives Workshop: Data Mining: The Next Generation (2005)

Abstract

Data Mining (DM) has enjoyed great popularity in recent years, with advances in both research and commercialization. The first generation of DM research and development has yielded several commercially available systems, both stand-alone and integrated with database systems; produced scalable versions of algorithms for many classical DM problems; and introduced novel pattern discovery problems. In recent years, research has tended to be fragmented into several distinct pockets without a comprehensive framework. Researchers have continued to work largely within the parameters of their parent disciplines, building upon existing and distinct research methodologies. Even when they address a common problem (for example, how to cluster a dataset) they apply different techniques, different perspectives on what the important issues are, and different evaluation criteria. While different approaches can be complementary, and such diversity is ultimately a strength of the field, better communication across disciplines is required if DM is to forge a distinct identity with a core set of principles, perspectives, and challenges that differentiate it from each of the parent disciplines. Further, while the amount and complexity of data continue to grow rapidly, and the task of distilling useful insight continues to be central, serious concerns have emerged about the social implications of DM. Addressing these concerns will require advances in our theoretical understanding of the principles that underlie DM algorithms, as well as an integrated approach to security and privacy in all phases of data management and analysis. Researchers from a variety of backgrounds assembled at Dagstuhl to re-assess the current directions of the field, to identify critical problems that require attention, and to discuss ways to increase the flow of ideas across the different disciplines that DM has brought together. The workshop did not seek to draw up an agenda for the field of DM. Rather, it offers the participants’ perspective on two technical directions – compositionality and privacy – and describes some important application challenges that drove the discussion. Both of these directions illustrate the opportunities for cross-disciplinary research, and there was broad agreement that they represent important and timely areas for further work; of course, the choice of these directions as topics for discussion also reflects the personal interests and biases of the workshop participants.

Cite as

Raghu Ramakrishnan, Rakesh Agrawal, Johann-Christoph Freytag, Toni Bollinger, Christopher W. Clifton, Saso Dzeroski, Jochen Hipp, Daniel Keim, Stefan Kramer, Hans-Peter Kriegel, Ulf Leser, Bing Liu, Heikki Mannila, Rosa Meo, Shinichi Morishita, Raymond Ng, Jian Pei, Prabhakar Raghavan, Myra Spiliopoulou, Jaideep Srivastava, and Vicenc Torra. Data Mining: The Next Generation. In Perspectives Workshop: Data Mining: The Next Generation. Dagstuhl Seminar Proceedings, Volume 4292, pp. 1-33, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2005)

@InProceedings{ramakrishnan_et_al:DagSemProc.04292.1,
  author =	{Ramakrishnan, Raghu and Agrawal, Rakesh and Freytag, Johann-Christoph and Bollinger, Toni and Clifton, Christopher W. and Dzeroski, Saso and Hipp, Jochen and Keim, Daniel and Kramer, Stefan and Kriegel, Hans-Peter and Leser, Ulf and Liu, Bing and Mannila, Heikki and Meo, Rosa and Morishita, Shinichi and Ng, Raymond and Pei, Jian and Raghavan, Prabhakar and Spiliopoulou, Myra and Srivastava, Jaideep and Torra, Vicenc},
  title =	{{Data Mining: The Next Generation}},
  booktitle =	{Perspectives Workshop: Data Mining: The Next Generation},
  pages =	{1--33},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2005},
  volume =	{4292},
  editor =	{Rakesh Agrawal and Johann Christoph Freytag and Raghu Ramakrishnan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.04292.1},
  URN =		{urn:nbn:de:0030-drops-2709},
  doi =		{10.4230/DagSemProc.04292.1},
  annote =	{Keywords: Data mining, databases, artificial intelligence, machine learning, statistics, semantics}
}

Search Results

Documents authored by Leser, Ulf

